Algebraic properties of structured context-free languages: old approaches and novel developments
The historical research line on the algebraic properties of structured CF
languages, initiated by McNaughton's Parenthesis Languages, has recently
attracted renewed interest with the Balanced Languages, the Visibly
Pushdown Automata languages (VPDA), the Synchronized Languages, and the
Height-deterministic ones. These families preserve, to varying degrees, the basic
algebraic properties of Regular languages: Boolean closure and closure under
reversal, concatenation, and Kleene star. We prove that the VPDA family
is strictly contained within the family of Floyd Grammars (FG), historically
known as operator precedence grammars. Languages over the same precedence matrix
are known to be closed under Boolean operations, and are recognized by a machine
whose pop or push operations on the stack are determined purely by terminal
letters. We characterize VPDAs as the subclass of FG having a peculiarly
structured set of precedence relations, and balanced grammars as a further
restricted case. The non-counting invariance property of FG has a direct
implication for VPDA too.
Comment: Extended version of paper presented at WORDS2009, Salerno, Italy,
September 200
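The stack discipline mentioned above (push or pop decided only by terminal letters) can be illustrated with a classic operator-precedence recognizer: the relation between the topmost terminal on the stack and the next input terminal alone decides whether to push or to reduce. The sketch below is a minimal illustration, assuming a hypothetical expression grammar E → E+T | T, T → T*F | F, F → n, a hand-derived precedence matrix, and "skeleton" parsing in which every nonterminal is collapsed to one placeholder N; it is not the construction used in the paper.

```python
# Hypothetical example: skeleton operator-precedence recognizer for
#   E -> E + T | T,  T -> T * F | F,  F -> n
# All nonterminals are collapsed into one placeholder "N" (skeleton parsing).

# Precedence relations (left terminal, right terminal); "#" marks both ends.
MATRIX = {
    ("+", "+"): ">", ("+", "*"): "<", ("+", "n"): "<", ("+", "#"): ">",
    ("*", "+"): ">", ("*", "*"): ">", ("*", "n"): "<", ("*", "#"): ">",
    ("n", "+"): ">", ("n", "*"): ">", ("n", "#"): ">",
    ("#", "+"): "<", ("#", "*"): "<", ("#", "n"): "<",
}
# Right-hand sides with every nonterminal written as "N" (unit rules dropped).
HANDLES = {("N", "+", "N"), ("N", "*", "N"), ("n",)}


def op_recognize(tokens, matrix=MATRIX, handles=HANDLES):
    stack = [("#", False)]              # entries: (symbol, starts_a_handle)
    inp = list(tokens) + ["#"]
    i = 0
    while True:
        a = inp[i]
        k = len(stack) - 1              # index of the topmost terminal
        while stack[k][0] == "N":
            k -= 1
        top = stack[k][0]
        if top == "#" and a == "#":
            return len(stack) == 2 and stack[1][0] == "N"
        rel = matrix.get((top, a))
        if rel == "<":                  # push: a new handle starts here
            if k < len(stack) - 1:      # a nonterminal on top belongs to it
                stack[-1] = (stack[-1][0], True)
                stack.append((a, False))
            else:
                stack.append((a, True))
            i += 1
        elif rel == "=":                # push inside the current handle
            stack.append((a, False))
            i += 1
        elif rel == ">":                # pop the handle, replace it by N
            handle = []
            while True:
                if len(stack) == 1:     # reached the bottom marker: malformed
                    return False
                sym, marked = stack.pop()
                handle.append(sym)
                if marked:
                    break
            if tuple(reversed(handle)) not in handles:
                return False
            stack.append(("N", False))
        else:                           # no relation defined: reject
            return False


print(op_recognize("n+n*n"))  # True
print(op_recognize("+n"))     # False
```

Checking each popped handle against the skeleton right-hand sides is what rejects strings such as "+n", which have no precedence conflict but do not match any rule.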
Higher-Order Operator Precedence Languages
Floyd's Operator Precedence (OP) languages are a deterministic context-free
family having many desirable properties. They can be parsed locally and in
parallel, and languages having a compatible structure are closed under Boolean
operations, concatenation and star; they properly include the family of Visibly
Pushdown (or Input Driven) languages. OP languages are based on three relations
between any two consecutive terminal symbols, which assign syntax structure to
words. We extend such relations to k-tuples of consecutive terminal symbols, by
using the model of strictly locally testable regular languages of order k, with
k at least 3. The new corresponding class of Higher-order Operator Precedence
languages (HOP) properly includes the OP languages, and it is still included in
the deterministic (also in reverse) context-free family. We prove Boolean
closure for each subfamily of structurally compatible HOP languages. In each
subfamily, the top language is called max-language. We show that such languages
are defined by a simple cancellation rule and we prove several properties, in
particular that max-languages form an infinite hierarchy ordered by parameter
k. HOP languages are a candidate for replacing OP languages in the various
applications where they have been successful, though sometimes too
restrictive.
Comment: In Proceedings AFL 2017, arXiv:1708.0622
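The strictly locally testable (SLT) model of order k that HOP languages build on can be sketched as a membership test: a word belongs to the language iff every length-k factor of the word, padded with a boundary symbol, is in a fixed set of allowed factors. The sketch below assumes one common convention (a single "#" on each side, and a padded word shorter than k counts as its own factor); the allowed set for (ab)+ is a hypothetical example, and the HOP relations themselves are not implemented.

```python
def k_factors(word, k, boundary="#"):
    """Length-k factors of the word padded with one boundary symbol per side."""
    padded = boundary + word + boundary
    if len(padded) < k:              # too short: the padded word is its own factor
        return {padded}
    return {padded[i:i + k] for i in range(len(padded) - k + 1)}


def slt_accepts(word, k, allowed):
    """Strictly locally testable membership: every k-factor must be allowed."""
    return k_factors(word, k) <= allowed


# Hypothetical example: (ab)+ as a strictly locally testable language of order 3.
ALLOWED_3 = {"#ab", "aba", "bab", "ab#"}

for w in ["ab", "abab", "aab", "aba", ""]:
    print(repr(w), slt_accepts(w, 3, ALLOWED_3))
```

Because only a window of k symbols is ever inspected, the test is purely local, which is the property the HOP relations inherit when they look at k consecutive terminals instead of two.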
Commutative Languages and their Composition by Consensual Methods
Commutative languages with the semilinear property (SLIP) can be naturally
recognized by real-time NLOG-SPACE multi-counter machines. We show that unions
and concatenations of such languages can be similarly recognized, relying on,
and further developing, our recent results on the family of consensually
regular (CREG) languages. A CREG language is defined by a regular language on
the alphabet that includes the terminal alphabet and its marked copy. New
conditions ensuring that the union or concatenation of CREG languages is itself
CREG are presented and applied to the commutative SLIP languages. The paper
contributes to the knowledge of the CREG family, and introduces novel
techniques for language composition, based on arithmetic congruences that act
as language signatures. Open problems are listed.
Comment: In Proceedings AFL 2014, arXiv:1405.527
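The counter-based recognition of commutative languages mentioned above can be illustrated on hypothetical examples: membership depends only on the Parikh vector (the count of each letter), so one left-to-right pass with one counter per letter suffices, and each counter needs only O(log n) bits. The sketch below assumes the alphabet {a, b, c} and two illustrative languages, L1 = {w : |w|_a = |w|_b = |w|_c} and L2 = {w : |w|_a = 2|w|_b}. It shows only the counter idea and the easy case of union; it does not implement the consensual (CREG) construction the paper uses for concatenation.

```python
from collections import Counter
from itertools import permutations

ALPHABET = {"a", "b", "c"}


def parikh_vector(word):
    """Single left-to-right pass keeping one counter per letter."""
    counts = Counter()
    for ch in word:
        if ch not in ALPHABET:
            raise ValueError(f"symbol {ch!r} not in alphabet")
        counts[ch] += 1
    return counts["a"], counts["b"], counts["c"]


def in_l1(word):
    """Hypothetical example: |w|_a = |w|_b = |w|_c (not context-free)."""
    a, b, c = parikh_vector(word)
    return a == b == c


def in_l2(word):
    """Hypothetical example: |w|_a = 2 * |w|_b, any number of c."""
    a, b, _ = parikh_vector(word)
    return a == 2 * b


def in_union(word):
    """Union: both tests read the same counters from one pass."""
    a, b, c = parikh_vector(word)
    return (a == b == c) or (a == 2 * b)


def same_on_permutations(pred, word):
    """Commutativity check on one word: pred is constant on all its permutations."""
    return len({pred("".join(p)) for p in permutations(word)}) == 1


print(in_l1("cab"), in_l2("aab"), in_union("abb"))  # True True False
```

Union is easy because both machines can share a single pass; concatenation is harder, since the split point between the two factors is not known in advance.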
From regular to strictly locally testable languages
Comment: In Proceedings WORDS 2011, arXiv:1108.341
Aperiodicity, Star-freeness, and First-order Definability of Structured Context-Free Languages
A classic result in formal language theory is the equivalence among
noncounting (or aperiodic) regular languages, languages defined through
star-free regular expressions, and languages defined in first-order logic.
Together with the first-order completeness of linear temporal logic, these
results constitute a theoretical foundation for model-checking algorithms.
Extending these results to structured subclasses of context-free languages, such
as tree languages, did not work as smoothly: for instance, W. Thomas showed that
there are star-free tree languages that are counting. We show, instead, that
investigating the same properties within the family of operator precedence
languages leads to equivalences that perfectly match those on regular languages.
The study of this old family of context-free languages has recently been resumed
not only to enhance parsing (the original motivation of its inventor R. Floyd)
but also to exploit its algebraic and logic properties. We have been able to
reproduce the classic results of regular languages for this much larger class by
going back to string languages rather than tree languages. Since operator
precedence languages strictly include other classes of structured languages such
as visibly pushdown languages, the results given in this paper hold as a trivial
corollary for that family too.
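For the regular case, the classic equivalence recalled above can be checked mechanically: by the theorems of McNaughton and Papert and of Schützenberger, a regular language is noncounting (equivalently, star-free) iff its syntactic monoid is aperiodic, that is, every element m satisfies m^k = m^(k+1) for some k. The syntactic monoid is the transition monoid of the minimal DFA. The sketch below assumes the input DFA is already minimal and complete, with states numbered from 0 and one transformation tuple per letter; the two example automata are hypothetical. It covers only regular languages, not the operator precedence extension studied in the paper.

```python
def compose(f, g):
    """Transformation of the word f·g: apply f first, then g."""
    return tuple(g[f[q]] for q in range(len(f)))


def transition_monoid(n_states, delta):
    """All state transformations induced by words (delta: letter -> tuple)."""
    identity = tuple(range(n_states))
    monoid = {identity}
    frontier = [identity]
    while frontier:                      # breadth-first closure under letters
        new = []
        for f in frontier:
            for g in delta.values():
                h = compose(f, g)
                if h not in monoid:
                    monoid.add(h)
                    new.append(h)
        frontier = new
    return monoid


def is_aperiodic(monoid):
    """True iff every element m satisfies m^k = m^(k+1) for some k."""
    for m in monoid:
        seen = {m}
        power = m
        while True:
            nxt = compose(power, m)
            if nxt == power:             # powers stabilise: period 1
                break
            if nxt in seen:              # powers cycle with period > 1
                return False
            seen.add(nxt)
            power = nxt
    return True


# (aa)*: minimal DFA with 2 states, "a" swaps them, so the language counts.
print(is_aperiodic(transition_monoid(2, {"a": (1, 0)})))                    # False
# a*b*: states 0 (a-part), 1 (b-part), 2 (sink), so the language is noncounting.
print(is_aperiodic(transition_monoid(3, {"a": (0, 2, 2), "b": (1, 1, 2)})))  # True
```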
Algebraic properties of operator precedence languages
This paper presents new results on the algebraic ordering properties of operator precedence grammars and languages. This work was motivated by, and applied to, the mechanical acquisition, or inference, of operator precedence grammars. A new normal form of operator precedence grammars, called homogeneous, is defined. An algorithm is given to construct a grammar, called max-grammar, generating the largest language compatible with a given precedence matrix. Then the class of free grammars is introduced as a special subclass of operator precedence grammars. It is shown that the operator precedence languages corresponding to a given precedence matrix form a Boolean algebra.
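The precedence matrix these results are organized around can be derived from an operator grammar with Floyd's classic construction, which uses the sets of leftmost and rightmost terminals of each nonterminal. The sketch below assumes an operator grammar (no empty rules and no two adjacent nonterminals in any right-hand side), uses a hypothetical arithmetic-expression grammar, and writes the relations ⋖, ≐, ⋗ as "<", "=", ">". It does not implement the paper's homogeneous normal form or its max-grammar algorithm.

```python
from collections import defaultdict

# Hypothetical operator grammar; symbols that are not keys are terminals.
GRAMMAR = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("n",)],
}


def terminal_sets(grammar):
    """Leftmost (lt) and rightmost (rt) terminal sets, computed to a fixpoint."""
    lt = {a: set() for a in grammar}
    rt = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, rules in grammar.items():
            for rhs in rules:
                for sets, seq in ((lt, rhs), (rt, rhs[::-1])):
                    new = set()
                    if seq[0] in grammar:
                        new |= sets[seq[0]]
                        if len(seq) > 1:      # operator form: a terminal follows
                            new.add(seq[1])
                    else:
                        new.add(seq[0])
                    if not new <= sets[a]:
                        sets[a] |= new
                        changed = True
    return lt, rt


def precedence_matrix(grammar, axiom):
    lt, rt = terminal_sets(grammar)
    rel = defaultdict(set)
    for rules in grammar.values():
        for rhs in rules:
            for i in range(len(rhs) - 1):
                x, y = rhs[i], rhs[i + 1]
                if x not in grammar and y not in grammar:
                    rel[(x, y)].add("=")          # a b
                if x not in grammar and y in grammar:
                    for b in lt[y]:
                        rel[(x, b)].add("<")      # a B, b in lt(B)
                    if i + 2 < len(rhs):
                        rel[(x, rhs[i + 2])].add("=")  # a B c
                if x in grammar and y not in grammar:
                    for b in rt[x]:
                        rel[(b, y)].add(">")      # A c, b in rt(A)
    for b in lt[axiom]:
        rel[("#", b)].add("<")
    for b in rt[axiom]:
        rel[(b, "#")].add(">")
    return rel


def conflicts(rel):
    """Pairs with more than one relation: the grammar is then not OP."""
    return {pair: r for pair, r in rel.items() if len(r) > 1}


rel = precedence_matrix(GRAMMAR, "E")
print(rel[("+", "*")], rel[("*", "+")], conflicts(rel))  # {'<'} {'>'} {}
```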
Toward a theory of input-driven locally parsable languages
If a context-free language enjoys the local parsability property, then, no matter how the source string is segmented, each segment can be parsed independently, and an efficient parallel parsing algorithm becomes possible. The new class of locally chain parsable languages (LCPLs), included in the deterministic context-free language family, is here defined by means of the chain-driven automaton and characterized by decidable properties of grammar derivations. Such an automaton decides whether or not to reduce a substring purely on the basis of the terminal characters, thus extending the well-known concept of input-driven (ID), alias visibly pushdown, machines. The LCPL family extends and improves the practically relevant Floyd's operator-precedence (OP) languages, which are known to strictly include the ID languages and for which a parallel-parser generator exists.
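The input-driven (visibly pushdown) discipline that LCPLs generalize can be sketched in a few lines. The alphabet is partitioned into calls, which always push; returns, which always pop; and internals, which leave the stack untouched. The stack action is therefore fixed by the current input symbol alone. The sketch below assumes a hypothetical alphabet with two bracket pairs and one internal symbol "x", and accepts only well-matched words. It does not model the more general chain-driven automaton.

```python
CALLS = {"(": ")", "[": "]"}   # hypothetical: call symbol -> matching return
RETURNS = {")", "]"}
INTERNALS = {"x"}


def id_accepts(word):
    """Input-driven recognizer: the symbol class alone fixes the stack action."""
    stack = []
    for ch in word:
        if ch in CALLS:            # call: always push
            stack.append(ch)
        elif ch in RETURNS:        # return: always pop, and check the pair
            if not stack or CALLS[stack.pop()] != ch:
                return False
        elif ch in INTERNALS:      # internal: stack untouched
            continue
        else:
            raise ValueError(f"symbol {ch!r} not in alphabet")
    return not stack               # accept only well-matched words


print(id_accepts("(x[x])"), id_accepts("([)]"))  # True False
```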
Parallel parsing made practical
The property of local parsability allows inputs to be parsed by inspecting only a bounded-length string around the current token. This in turn enables the construction of a scalable, data-parallel parsing algorithm, which is presented in this work. Such an algorithm lends itself to automatic generation by a parser generator tool, which was implemented and is also presented here. Furthermore, to complete the framework of parallel input analysis, a parallel scanner can also be combined with the parser. To prove the practicality of a parallel lexing and parsing approach, we report the results of adapting JSON and Lua to a form fit for parallel parsing (i.e., an operator-precedence grammar) through simple grammar changes and scanning transformations. The approach is validated with performance figures from both high-performance and embedded multicore platforms, obtained by analyzing real-world inputs as a test bench. The results show that our approach matches or outperforms production-grade LR parsers in sequential execution, and achieves significant speedups and good scaling on multicore machines. The work concludes with a broad and critical survey of past work on parallel parsing and of future directions for integration with semantic analysis and incremental parsing.
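The bounded-context property behind this approach can be illustrated on its simplest step. In an operator-precedence grammar, the relation between two adjacent terminals depends only on those two tokens, so the input can be cut into chunks that overlap by one token and each chunk can be processed independently. The sketch below assumes a hypothetical precedence matrix for the tokens n, + and *, and uses threads only to show the independence of the chunks. In CPython the GIL prevents real speedup for this pure-Python work. The sketch computes only the relation sequence, not the reduction and chunk-joining phases of the parallel parser described above.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical precedence matrix for n, +, * with end marker "#".
MATRIX = {
    ("+", "+"): ">", ("+", "*"): "<", ("+", "n"): "<", ("+", "#"): ">",
    ("*", "+"): ">", ("*", "*"): ">", ("*", "n"): "<", ("*", "#"): ">",
    ("n", "+"): ">", ("n", "*"): ">", ("n", "#"): ">",
    ("#", "+"): "<", ("#", "*"): "<", ("#", "n"): "<",
}


def relations(tokens):
    """Relation for each adjacent pair; each needs only a 2-token window."""
    return [MATRIX.get((tokens[i], tokens[i + 1])) for i in range(len(tokens) - 1)]


def parallel_relations(tokens, n_chunks=4):
    """Same result as relations(), computed on chunks overlapping by one token."""
    padded = ["#"] + list(tokens) + ["#"]
    n_pairs = len(padded) - 1
    size = max(1, -(-n_pairs // n_chunks))           # ceiling division
    # The chunk starting at s covers pairs s .. s+size-1, i.e. tokens s .. s+size.
    slices = [padded[s:s + size + 1] for s in range(0, n_pairs, size)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        parts = list(pool.map(relations, slices))     # results keep chunk order
    return [r for part in parts for r in part]


toks = list("n+n*n+n")
print(parallel_relations(toks, 3) == relations(["#"] + toks + ["#"]))  # True
```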